Mining MEDLINE: Abstracts, Sentences, or Phrases?
Abstract
A growing body of work addresses the automated mining of biochemical knowledge from digital repositories of scientific literature, such as MEDLINE. Some of these works use abstracts as the unit of text from which to extract facts. Others use sentences for this purpose, while still others use phrases. Here we compare abstracts, sentences, and phrases in MEDLINE using the standard information retrieval performance measures of recall, precision, and effectiveness, for the task of mining interactions among biochemical terms based on term co-occurrence. Results show statistically significant differences that can impact the choice of text unit.
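As a rough illustration of what comparing text units by term co-occurrence involves, the sketch below counts how often a term pair co-occurs at abstract level versus sentence level. It is not the paper's method: the function names, the naive sentence splitter, the substring matching, and the toy abstracts are all assumptions made for this example. The resulting ratio is only a co-occurrence coverage proxy; the recall and precision reported in the paper depend on judging whether a co-occurrence actually describes an interaction, which this code does not attempt.

```python
import re


def split_sentences(abstract):
    # Assumption: a naive splitter that breaks after '.', '!' or '?' plus whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", abstract.strip()) if s]


def cooccurs(text, term_a, term_b):
    # Assumption: case-insensitive substring match, with no synonym handling.
    lowered = text.lower()
    return term_a.lower() in lowered and term_b.lower() in lowered


def count_cooccurrences(abstracts, term_a, term_b):
    """Return (abstracts where the pair co-occurs anywhere,
    abstracts where the pair co-occurs within a single sentence)."""
    in_abstract = 0
    in_sentence = 0
    for abstract in abstracts:
        if cooccurs(abstract, term_a, term_b):
            in_abstract += 1
            if any(cooccurs(s, term_a, term_b) for s in split_sentences(abstract)):
                in_sentence += 1
    return in_abstract, in_sentence


# Hypothetical toy data, not drawn from MEDLINE.
toy_abstracts = [
    "Insulin binds the insulin receptor. Glucose uptake then increases.",
    "Insulin stimulates glucose uptake in muscle cells.",
    "Leptin regulates appetite.",
]

in_abstract, in_sentence = count_cooccurrences(toy_abstracts, "insulin", "glucose")
coverage = in_sentence / in_abstract if in_abstract else 0.0
print(in_abstract, in_sentence, coverage)
```

On the toy data, the first abstract mentions both terms but in different sentences, so it counts at abstract level only. That is the kind of trade-off between text units the abstract describes: smaller units miss some co-occurrences that larger units capture.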
Similar resources
Pacific Symposium on Biocomputing 7:326-337 (2002). MINING MEDLINE: ABSTRACTS, SENTENCES, OR PHRASES?
$$\text{recall} = \frac{\#\text{ of interactions between } A \text{ and } B \text{ occurring within a type of text unit}}{\#\text{ of interactions between } A \text{ and } B \text{ occurring within abstracts}}$$
where A and B are query terms or their synonyms. Intuitively, recall here measures the capacity of a given text unit to contain the interactions present in MEDLINE abstracts. Any interaction described within a particular text unit is also described within all la...
Finding Cue Expressions for Knowledge Extraction from Scientific Text: Early Results
This paper investigates whether and how natural language processing and data mining techniques can be utilized for locating desired knowledge in a large text collection. This task amounts to finding cue words and phrases indicating the location of knowledge, where the challenge is to establish a methodology that can cope with the diversity of expressions. We examine the feasibility of mining cu...
Proceedings of the Pacific Knowledge Acquisition Workshop 2004
This paper investigates whether and how natural language processing and data mining techniques can be utilized for locating desired knowledge in a large text collection. This task amounts to finding cue words and phrases indicating the location of knowledge, where the challenge is to establish a methodology that can cope with the diversity of expressions. We examine the feasibility of mining cu...
Identifying Sections in Scientific Abstracts using Conditional Random Fields
OBJECTIVE: The prior knowledge about the rhetorical structure of scientific abstracts is useful for various text-mining tasks such as information extraction, information retrieval, and automatic summarization. This paper presents a novel approach to categorize sentences in scientific abstracts into four sections, objective, methods, results, and conclusions. METHOD: Formalizing the categorizati...
PathBinderH: a Tool for Sentence-Focused, Plant Taxonomy-Sensitive Access to the Biological Literature
Mining the biological “literaturome” promises significant advancements in genome annotation, literature access, curation support, and other applications. Standard tools allow users to identify scientific abstracts containing one or more query terms. In contrast, PathBinderH is a Webserved text mining tool that allows users to search PubMed (including MEDLINE) for sentences with co-occurring ter...
Journal title:
- Pacific Symposium on Biocomputing
Publication date: 2002